We study regression models for the situation where both dependent and independent variables
are square-integrable stochastic processes. Questions concerning the definition and existence of
the corresponding functional linear regression models and some basic properties are explored
for this situation. We derive a representation of the regression parameter function in terms of
the canonical components of the processes involved. This representation establishes a connection
between functional regression and functional canonical analysis and suggests alternative
approaches for the implementation of functional linear regression analysis. A specific procedure
for the estimation of the regression parameter function using canonical expansions is proposed
and compared with an established functional principal component regression approach. As an
example of an application, we present an analysis of mortality data for cohorts of medflies,
obtained in experimental studies of aging and longevity.

1. Introduction

With the advancement of modern technology, data sets which contain repeated measurements
obtained on a dense grid are becoming ubiquitous. Such data can be viewed as a
sample of curves or functions and are referred to as functional data. We consider here
the extension of the linear regression model to the case of functional data. In this extension,
both predictors and responses are random functions rather than random vectors.
It is well known (Ramsay and Dalzell (1991); Ramsay and Silverman (2005)) that the
traditional linear regression model for multivariate data, defined as

$$Y = \alpha_0 + X\beta_0 + \varepsilon, \tag{1}$$

may be extended to the functional setting by postulating the model, for $s \in T_1$, $t \in T_2$,

$$Y(t) = \alpha_0(t) + \int_{T_1} X(s)\,\beta_0(s,t)\,ds + \varepsilon(t). \tag{2}$$

Writing all vectors as row vectors in the classical model (1), $Y$ and $\varepsilon$ are random
vectors in $\mathbb{R}^{p_2}$, $X$ is a random vector in $\mathbb{R}^{p_1}$, and $\alpha_0$ and $\beta_0$ are, respectively, $1 \times p_2$
and $p_1 \times p_2$ matrices containing the regression parameters. The vector $\varepsilon$ has the usual
interpretation of an error vector, with $E[\varepsilon] = 0$ and $\operatorname{cov}[\varepsilon] = \sigma^2 I$, $I$ denoting the identity
matrix. In the functional model (2), random vectors $X$, $Y$ and $\varepsilon$ in (1) are replaced by
random functions defined on the intervals $T_1$ and $T_2$. The extension of the classical linear
model (1) to the functional linear model (2) is obtained by replacing the matrix operation
on the right-hand side of (1) with an integral operator in (2). In the original approach
of Ramsay and Dalzell (1991), a penalized least-squares approach using L-splines was
adopted and applied to a study in temperature-precipitation patterns, based on data
from Canadian weather stations.
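
To make the replacement of the matrix product by an integral operator concrete, the following minimal sketch discretizes model (2) on regular grids and approximates the integral by a Riemann sum. The grids, the surface `beta0`, the noise level and the helper name `riemann_operator` are hypothetical choices made for illustration only; they are not taken from the paper.

```python
import numpy as np

def riemann_operator(X, beta, s_grid):
    """Approximate int X(s) beta(s, t) ds on a regular grid.

    X: (n, Ns) curves sampled at s_grid; beta: (Ns, Nt) surface values.
    """
    ds = s_grid[1] - s_grid[0]          # equidistant spacing is assumed
    return X @ beta * ds                # (n, Nt): one response curve per row

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 101)          # grid on T1
t = np.linspace(0.0, 1.0, 81)           # grid on T2

# Hypothetical smooth parameter surface beta0(s, t), for illustration only.
beta0 = np.outer(np.sin(np.pi * s), np.cos(np.pi * t))

# Hypothetical predictor curves driven by two random coefficients.
coef = rng.normal(size=(50, 2))
X = coef[:, [0]] * np.sin(np.pi * s) + coef[:, [1]] * np.cos(np.pi * s)

# Discretized version of model (2) with alpha0 = 0 and white-noise errors.
eps = 0.05 * rng.normal(size=(50, t.size))
Y = riemann_operator(X, beta0, s) + eps
```

In this discretized form the integral operator is again a matrix product, which is why the classical model (1) appears as the special case of finitely many grid points.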

The functional regression model (2) for the case of scalar responses has attracted
much recent interest (Cardot and Sarda (2005); Müller and Stadtmüller (2005); Hall and
Horowitz (2007)), while the case of functional responses has been much less thoroughly
investigated (Ramsay and Dalzell (1991); Yao, Müller and Wang (2005b)). Discussions on
various approaches and estimation procedures can be found in the insightful monograph
of Ramsay and Silverman (2005). In this paper, we propose an alternative approach to
predict $Y(\cdot)$ from $X(\cdot)$, by adopting a novel canonical representation of the regression
parameter function $\beta_0(s,t)$. Several distinctive features of functional linear models emerge
in the development of this canonical expansion approach.

It is well known that in the classical multivariate linear model, the regression slope
parameter matrix is uniquely determined by $\beta_0 = \operatorname{cov}(X)^{-1}\operatorname{cov}(X, Y)$, as long as the
covariance matrix $\operatorname{cov}(X)$ is invertible. In contrast, the corresponding parameter function
$\beta_0(\cdot,\cdot)$, appearing in (2), is typically not identifiable. This identifiability issue is discussed
in Section 2. It relates to the compactness of the covariance operator of the process $X$,
which makes it non-invertible. In Section 2, we demonstrate how restriction to a subspace
allows this problem to be circumvented. Under suitable restrictions, the components of
model (2) are then well defined.

Utilizing the canonical decomposition in Theorem 3.3 below leads to an alternative approach
to estimating the parameter function $\beta_0(\cdot,\cdot)$. The canonical decomposition links
$Y$ and $X$ through their functional canonical correlation structure. The corresponding
canonical components form a bridge between canonical analysis and linear regression
modeling. Canonical components provide a decomposition of the structure of the dependency
between $Y$ and $X$ and lead to a natural expansion of the regression parameter
function $\beta_0(\cdot,\cdot)$, thus aiding in its interpretation. The canonical regression decomposition
also suggests a new family of estimation procedures for functional regression analysis. We
refer to this methodology as functional canonical regression analysis. Classical canonical
correlation analysis (CCA) was introduced by Hotelling (1936) and was connected to
function spaces by Hannan (1961). Substantial extensions and connections to reproducing
kernel Hilbert spaces were recently developed in Eubank and Hsing (2008); for other
recent developments see Cupidon et al. (2007).

Canonical correlation is known not to work particularly well for very high-dimensional
multivariate data, as it involves an inverse problem. Leurgans, Moyeed and Silverman
(1993) tackled the difficult problem of extending CCA to the case of infinite-dimensional
functional data and discussed the precarious regularization issues which are faced; He,
Müller and Wang (2003, 2004) further explored various aspects and proposed practically
feasible regularization procedures for functional CCA. Although CCA for functional
data is worthwhile but difficult to implement and interpret, the canonical approach
to functional regression is here found to compare favorably with the well-established
principal-component-based regression approach in an example of an application (Section
5). This demonstrates a potentially important new role for canonical decompositions in
functional regression analysis. The functional linear model (2) includes the varying coefficient
linear model studied in Hoover et al. (1998) and Fan and Zhang (2000) as a special
case, where $\beta(s,t) = \beta(t)\delta_t(s)$; here, $\delta_t(\cdot)$ is a delta function centered at $t$ and $\beta(t)$ is
the varying coefficient function. Other forms of functional regression models with vector-valued
predictors and functional responses were considered by Faraway (1997), Shi, Weiss
and Taylor (1996), Rice and Wu (2000), Chiou, Müller and Wang (2003) and Ritz and
Streibig (2009).

The paper is organized as follows. Functional canonical analysis and functional linear 
models for $L_2$-processes are introduced in Section 2. Sufficient conditions for the existence
of functional normal equations are given in Proposition 2.2. The canonical regression 
decomposition and its properties are the theme of Section 3. In Section 4, we propose a 
novel estimation technique to obtain regression parameter function estimates based on 
functional canonical components. The regression parameter function is the basic model 
component of interest in functional linear models, in analogy to the parameter vector in 
classical linear models. The proposed estimation method, based on a canonical regression 
decomposition, is contrasted with an established functional regression method based on 
a principal component decomposition. These methods utilize a dimension reduction step 
to regularize the solution of the inverse problems posed by both functional regression 
and functional canonical analysis. As a selection criterion for tuning parameters, such 
as bandwidths or numbers of canonical components, we use minimization of prediction 
error via leave-one-curve-out cross-validation (Rice and Silverman (1991)). The proposed 
estimation procedures are applied to mortality data obtained for cohorts of medflies 
(Section 5). Our goal in this application is to predict a random trajectory of mortality 
for a female cohort of flies from the trajectory of mortality for a male cohort which 
was raised in the same cage. We find that the proposed functional canonical regression 
method gains an advantage over functional principal component regression in terms of 
prediction error. 

Additional results on canonical regression decompositions and properties of functional 
regression operators are compiled in Section 6. All proofs are collected in Section 7. 

2. Functional linear regression and the functional normal equation

In this section, we explore the formal setting as well as identifiability issues for functional
linear regression models. Both response and predictor functions are considered to come
from a sample of pairs of random curves. A basic assumption is that all random curves or
functions are square-integrable stochastic processes. Consider a measure $\mu$ on a real index
set $T$ and let $L_2(T)$ be the class of real-valued functions $f$ such that $\|f\|^2 = \int_T |f|^2\,d\mu < \infty$.
This is a Hilbert space with the inner product $\langle f, g\rangle = \int_T fg\,d\mu$ and we write $f = g$ if
$\int_T (f - g)^2\,d\mu = 0$. The index set $T$ can be a set of time points, such as $T = \{1, 2, \ldots, k\}$,
a compact interval $T = [a,b]$ or even a rectangle formed by two intervals $S_1$ and $S_2$,
$T = S_1 \times S_2$. We focus on index sets $T$ that are either compact real intervals or compact
rectangles in $\mathbb{R}^2$ and consider $\mu$ to be the Lebesgue measure on $\mathbb{R}$ or $\mathbb{R}^2$. Extensions
to other index sets $T$ and other measures are self-evident. An $L_2$-process is a stochastic
process $X = \{X(t), t \in T\}$, $X \in L_2(T)$, with $E[\|X\|^2] < \infty$, $E[X(t)^2] < \infty$ for all $t \in T$.
Let $X \in L_2(T_1)$ and $Y \in L_2(T_2)$.

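Definition 2.1. Processes $(X, Y)$ are subject to a functional linear model if

$$Y(t) = \int_{T_1} \beta_0(s,t)\,X(s)\,ds + \varepsilon(t), \qquad t \in T_2, \tag{3}$$

where $\beta_0 \in L_2(T_1 \times T_2)$ is the parameter function, $\varepsilon \in L_2(T_2)$ is a random error process
with $E[\varepsilon(t)] = 0$ for $t \in T_2$, and $\varepsilon$ and $X$ are uncorrelated, in the sense that $E[X(s)\varepsilon(t)] = 0$
for all $s \in T_1$, $t \in T_2$.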

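Without loss of generality, we assume from now on that all processes considered have
zero mean functions, $EX(t) = 0$ and $EY(s) = 0$ for all $t$, $s$. We define the regression
integral operator $\mathcal{L}_X : L_2(T_1 \times T_2) \to L_2(T_2)$ by

$$(\mathcal{L}_X\beta)(t) = \int_{T_1} X(s)\,\beta(s,t)\,ds \qquad \text{for any } \beta \in L_2(T_1 \times T_2).$$

Equation (3) can then be rewritten as

$$Y = \mathcal{L}_X\beta_0 + \varepsilon. \tag{4}$$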



Denote the auto- and cross-covariance functions of $X$ and $Y$ by

$$r_{XX}(s,t) = \operatorname{cov}[X(s), X(t)], \qquad s,t \in T_1,$$

$$r_{YY}(s,t) = \operatorname{cov}[Y(s), Y(t)], \qquad s,t \in T_2, \quad \text{and}$$

$$r_{XY}(s,t) = \operatorname{cov}[X(s), Y(t)], \qquad s \in T_1,\ t \in T_2.$$

The autocovariance operator of $X$ is the integral operator $R_{XX} : L_2(T_1) \to L_2(T_1)$, defined by

$$(R_{XX}u)(s) = \int_{T_1} r_{XX}(s,t)\,u(t)\,dt, \qquad u \in L_2(T_1).$$

Replacing $r_{XX}$ by $r_{YY}$, $r_{XY}$, we analogously define operators $R_{YY} : L_2(T_2) \to L_2(T_2)$
and $R_{XY} : L_2(T_2) \to L_2(T_1)$, similarly $R_{YX}$. Then $R_{XX}$ and $R_{YY}$ are compact, self-adjoint
and non-negative definite operators, and $R_{XY}$ and $R_{YX}$ are compact operators
(Conway (1985)). We refer to He et al. (2003) for a discussion of various properties of
these operators.

Another linear operator of interest is the integral operator $\Gamma_{XX} : L_2(T_1 \times T_2) \to L_2(T_1 \times T_2)$,

$$(\Gamma_{XX}\beta)(s,t) = \int_{T_1} r_{XX}(s,w)\,\beta(w,t)\,dw. \tag{5}$$

The operator equation

$$r_{XY} = \Gamma_{XX}\beta, \qquad \beta \in L_2(T_1 \times T_2), \tag{6}$$

is a direct extension of the least-squares normal equation and may be referred to as the
functional population normal equation.

Proposition 2.2. The following statements are equivalent for a function $\beta_0 \in L_2(T_1 \times T_2)$:

(a) $\beta_0$ satisfies the linear model (4);

(b) $\beta_0$ is a solution of the functional normal equation (6);

(c) $\beta_0$ minimizes $E\|Y - \mathcal{L}_X\beta\|^2$ among all $\beta \in L_2(T_1 \times T_2)$.

The proof is found in Section 7. In the infinite-dimensional case, the operator $\Gamma_{XX}$ is a
Hilbert-Schmidt operator in the Hilbert space $L_2$, according to Proposition 6.6 below.
A problem we face is that it is known from functional analysis that a bounded inverse
does not exist for such operators. A consequence is that the parameter function $\beta_0$ in
(3), (4) is not identifiable without additional constraints. In a situation where the inverse
of the covariance matrix does not exist in the multivariate case, a unique solution of the
normal equation always exists within the column space of $\operatorname{cov}(X)$ and this solution then
minimizes $E\|Y - X\beta\|^2$ on that space. Our idea to get around the non-invertibility
issue in the functional infinite-dimensional case is to extend this approach for the
non-invertible multivariate case to the functional case. Indeed, as is demonstrated in Theorem
2.3 below, under the additional Condition (C1), the solution of (6) exists in the subspace
defined by the range of $\Gamma_{XX}$. This unique solution indeed minimizes $E\|Y - \mathcal{L}_X\beta\|^2$.
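
The multivariate situation referred to above can be illustrated numerically. The sketch below, which uses simulated data and hypothetical variable names, builds predictors whose covariance matrix is singular, solves the normal equation with a generalized inverse, and then adds a kernel element to the solution. The cutoff `1e-10` used to decide which eigenvalues count as zero is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p1, p2 = 200, 4, 2

# Hypothetical predictors of rank 2 in R^4, so cov(X) has no ordinary inverse.
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p1))
Y = X @ rng.normal(size=(p1, p2)) + 0.1 * rng.normal(size=(n, p2))
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)

Sxx, Sxy = X.T @ X / n, X.T @ Y / n

# Generalized inverse: eigenvalues below the cutoff are treated as zero, which
# places the solution in the column space of Sxx.
B_star = np.linalg.pinv(Sxx, rcond=1e-10) @ Sxy

# Basis of ker(Sxx): eigenvectors belonging to numerically zero eigenvalues.
w, V = np.linalg.eigh(Sxx)
kernel = V[:, w < 1e-10 * w.max()]

# Any solution shifted by a kernel element solves the normal equation as well
# and produces the same fitted values.
B_other = B_star + kernel @ rng.normal(size=(kernel.shape[1], p2))
solves_normal_eq = np.allclose(Sxx @ B_star, Sxy) and np.allclose(Sxx @ B_other, Sxy)
same_fit = np.allclose(X @ B_star, X @ B_other)
```

The solution `B_star` is the only one of these that is orthogonal to the kernel, which is the finite-dimensional counterpart of the restriction used in the functional case.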

We will make use of the Karhunen-Loève decompositions (Ash and Gardner (1975))
for $L_2$-processes $X$ and $Y$,

$$X(s) = \sum_{m=1}^{\infty} \xi_m\theta_m(s),\ s \in T_1, \qquad \text{and} \qquad Y(t) = \sum_{j=1}^{\infty} \zeta_j\varphi_j(t),\ t \in T_2, \tag{7}$$

with random variables $\xi_m$, $\zeta_j$, $m, j \ge 1$, and orthonormal families of $L_2$-functions
$\{\theta_m\}_{m \ge 1}$ and $\{\varphi_j\}_{j \ge 1}$. Here, $E\xi_m = E\zeta_j = 0$, $E\xi_m\xi_p = \lambda_{Xm}\delta_{mp}$, $E\zeta_j\zeta_p = \lambda_{Yj}\delta_{jp}$ and
$\{(\lambda_{Xm}, \theta_m)\}$, $\{(\lambda_{Yj}, \varphi_j)\}$ are the eigenvalues and eigenfunctions of the covariance operators
$R_{XX}$ and $R_{YY}$, respectively, with $\sum_m \lambda_{Xm} < \infty$, $\sum_j \lambda_{Yj} < \infty$. Note that $\delta_{mj}$ is
the Kronecker symbol with $\delta_{mj} = 1$ for $m = j$, $\delta_{mj} = 0$ for $m \ne j$.

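We consider a subset of $L_2$ on which inverses of the operator $\Gamma_{XX}$ can be defined.
As a Hilbert-Schmidt operator, $\Gamma_{XX}$ is compact and therefore not invertible on $L_2$.
According to Conway (1985), page 50, the range of $\Gamma_{XX}$,

$$G_{XX} = \{\Gamma_{XX}h : h \in L_2(T_1 \times T_2)\},$$

is characterized by

$$G_{XX} = \Big\{ g \in L_2(T_1 \times T_2) : \sum_{m,j=1}^{\infty} \lambda_{Xm}^{-2}\,|\langle g, \theta_m\varphi_j\rangle|^2 < \infty,\ g \perp \ker(\Gamma_{XX}) \Big\}, \tag{8}$$

where $\ker(\Gamma_{XX}) = \{h : \Gamma_{XX}h = 0\}$. Defining

$$G_{XX}^{-1} = \Big\{ h \in L_2(T_1 \times T_2) : h = \sum_{m,j=1}^{\infty} \lambda_{Xm}^{-1}\langle g, \theta_m\varphi_j\rangle\,\theta_m\varphi_j,\ g \in G_{XX} \Big\},$$

we find that $\Gamma_{XX}$ is a one-to-one mapping from the vector space $G_{XX}^{-1} \subset L_2(T_1 \times T_2)$ onto
the vector space $G_{XX}$. Thus, restricting $\Gamma_{XX}$ to a subdomain defined by the subspace
$G_{XX}^{-1}$, we can define its inverse for $g \in G_{XX}$ as

$$\Gamma_{XX}^{-1}g = \sum_{m,j=1}^{\infty} \lambda_{Xm}^{-1}\langle g, \theta_m\varphi_j\rangle\,\theta_m\varphi_j. \tag{9}$$

$\Gamma_{XX}^{-1}$ then satisfies the usual properties of an inverse, in the sense that $\Gamma_{XX}\Gamma_{XX}^{-1}g = g$
for all $g \in G_{XX}$, and $\Gamma_{XX}^{-1}\Gamma_{XX}h = h$ for all $h \in G_{XX}^{-1}$.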

The following Condition (C1) for processes $(X, Y)$ is of interest.

Condition (C1). The $L_2$-processes $X$ and $Y$ with Karhunen-Loève decompositions (7)
satisfy

$$\sum_{m,j=1}^{\infty} \Big\{ \frac{E[\xi_m\zeta_j]}{\lambda_{Xm}} \Big\}^2 < \infty.$$

If (C1) is satisfied, then the solution to the non-invertibility problem as outlined above
is viable in the functional case, as demonstrated by the following basic result on functional
linear models.

Theorem 2.3 (Basic theorem for functional linear models). A unique solution of
the linear model (4) exists in $\ker(\Gamma_{XX})^{\perp}$ if and only if $X$ and $Y$ satisfy Condition (C1).
In this case, the unique solution is of the form

$$\beta_0^*(s,t) = (\Gamma_{XX}^{-1}r_{XY})(s,t). \tag{10}$$

As a consequence of Proposition 2.2, solutions of the functional linear model (4),
solutions of the functional population normal equation (6) and minimizers of $E\|Y - \mathcal{L}_X\beta\|^2$
are all equivalent and allow the usual projection interpretation.

Proposition 2.4. Assume $X$ and $Y$ satisfy Condition (C1). The following are then
equivalent:

(a) the set of all solutions of the functional linear model (4);

(b) the set of all solutions of the population normal equation (6);

(c) the set of all minimizers of $E\|Y - \mathcal{L}_X\beta\|^2$ for $\beta \in L_2(T_1 \times T_2)$;

(d) the set $\beta_0^* + \ker(\Gamma_{XX}) = \{\beta_0^* + h \mid h \in L_2(T_1 \times T_2),\ \Gamma_{XX}h = 0\}$.

It is well known that in a finite-dimensional situation, the linear model (6) always
has a unique solution in the column space of $\Gamma_{XX}$, which may be obtained by using a
generalized inverse of the matrix $\Gamma_{XX}$. However, in the infinite-dimensional case, such a
solution does not always exist. The following example demonstrates that a pair of
$L_2$-processes does not necessarily satisfy Condition (C1). In this case, the linear model (6)
does not have a solution.



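Example 2.5. Assume processes $X$ and $Y$ have Karhunen-Loève expansions (7), where
the random variables $\xi_m$, $\zeta_j$ satisfy

$$\lambda_{Xm} = E[\xi_m^2] = \frac{1}{m^2}, \qquad \lambda_{Yj} = E[\zeta_j^2] = \frac{1}{j^2}, \tag{11}$$

and let

$$E[\xi_m\zeta_j] = \frac{1}{(m+1)^2(j+1)^2} \qquad \text{for } m, j \ge 1. \tag{12}$$

As shown in He et al. (2003), (11) and (12) can be satisfied by a pair of $L_2$-processes
with appropriate operators $R_{XX}$, $R_{YY}$ and $R_{XY}$. Then

$$\sum_{m,j=1}^{\infty} \Big[ \frac{E[\xi_m\zeta_j]}{\lambda_{Xm}} \Big]^2 = \lim_{n\to\infty} \sum_{m,j=1}^{n} \Big[ \frac{m}{(m+1)(j+1)} \Big]^4 \ge \lim_{n\to\infty} \frac{1}{16}\sum_{m=1}^{n} \Big[ \frac{m}{m+1} \Big]^4 = \infty$$

and, therefore, Condition (C1) is not satisfied.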
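
The divergence in Example 2.5 can be checked numerically. The sketch below assumes the eigenvalues $\lambda_{Xm} = 1/m^2$ and the cross moments $E[\xi_m\zeta_j] = 1/\{(m+1)^2(j+1)^2\}$ and evaluates partial sums of the series in Condition (C1); the function name `c1_partial_sum` is hypothetical.

```python
import numpy as np

def c1_partial_sum(n):
    """Partial sum over m, j <= n of (E[xi_m zeta_j] / lambda_Xm)^2."""
    m = np.arange(1, n + 1, dtype=float)
    lam_x = 1.0 / m ** 2                                   # assumed eigenvalues of R_XX
    cross = 1.0 / np.outer((m + 1.0) ** 2, (m + 1.0) ** 2) # assumed E[xi_m zeta_j]
    return float(np.sum((cross / lam_x[:, None]) ** 2))

partial_sums = [c1_partial_sum(n) for n in (50, 100, 200, 400)]
```

Under these assumptions the partial sums keep growing, roughly in proportion to $n$, because the terms with fixed $j$ approach the positive constant $(j+1)^{-4}$ as $m$ increases.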

3. Canonical regression analysis

Canonical analysis is a time-honored tool for studying the dependency between the components
of a pair of random vectors or stochastic processes; for multivariate stationary
time series, its utility was established in the work of Brillinger (1985). In this section,
we demonstrate that functional canonical decomposition provides a useful tool to represent
functional linear models. The definition of functional canonical correlation for
$L_2$-processes is as follows.

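Definition 3.1. The first canonical correlation $\rho_1$ and weight functions $u_1$ and $v_1$ for
$L_2$-processes $X$ and $Y$ are defined as

$$\rho_1 = \sup_{u \in L_2(T_1),\, v \in L_2(T_2)} \operatorname{cov}(\langle u, X\rangle, \langle v, Y\rangle) = \operatorname{cov}(\langle u_1, X\rangle, \langle v_1, Y\rangle), \tag{13}$$

where $u$ and $v$ are subject to

$$\operatorname{var}(\langle u_j, X\rangle) = 1, \qquad \operatorname{var}(\langle v_j, Y\rangle) = 1 \tag{14}$$

for $j = 1$. The $k$th canonical correlation $\rho_k$ and weight functions $u_k$, $v_k$ for processes $X$
and $Y$ for $k > 1$ are defined as

$$\rho_k = \sup_{u \in L_2(T_1),\, v \in L_2(T_2)} \operatorname{cov}(\langle u, X\rangle, \langle v, Y\rangle) = \operatorname{cov}(\langle u_k, X\rangle, \langle v_k, Y\rangle),$$

where $u$ and $v$ are subject to (14) for $j = k$ and

$$\operatorname{cov}(\langle u_k, X\rangle, \langle u_j, X\rangle) = 0, \qquad \operatorname{cov}(\langle v_k, Y\rangle, \langle v_j, Y\rangle) = 0$$

for $j = 1, \ldots, k-1$. We refer to $U_k = \langle u_k, X\rangle$ and $V_k = \langle v_k, Y\rangle$ as the $k$th canonical
variates and to $(\rho_k, u_k, v_k, U_k, V_k)$ as the $k$th canonical components.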
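
For random vectors instead of processes, the constrained maximization in Definition 3.1 reduces to a singular value decomposition of the whitened cross-covariance matrix. The following sketch shows this finite-dimensional analogue on simulated data; the function names `inv_sqrt` and `cca` and the simulated inputs are hypothetical, and the sketch assumes nonsingular sample covariance matrices, which is exactly what fails in the functional case.

```python
import numpy as np

def inv_sqrt(S, tol=1e-10):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    if w.min() <= tol:
        raise ValueError("covariance matrix is numerically singular")
    return V @ np.diag(w ** -0.5) @ V.T

def cca(X, Y):
    """Canonical correlations and weight vectors (columns) for data matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = X.T @ X / n, Y.T @ Y / n, X.T @ Y / n
    Wx, Wy = inv_sqrt(Sxx), inv_sqrt(Syy)
    A, rho, Bt = np.linalg.svd(Wx @ Sxy @ Wy, full_matrices=False)
    U = Wx @ A        # column k is u_k, with sample variance of <u_k, X> equal to 1
    V = Wy @ Bt.T     # column k is v_k, with sample variance of <v_k, Y> equal to 1
    return rho, U, V

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
Y = X @ rng.normal(size=(3, 2)) + rng.normal(size=(300, 2))
rho, U, V = cca(X, Y)
```

The singular values are returned in decreasing order, so `rho[0]` plays the role of $\rho_1$ in (13), and the unit-variance and zero-covariance constraints hold for the sample versions of the canonical variates.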

It has been shown in He et al. (2003) that canonical correlations do not exist for all
$L_2$-processes, but that Condition (C2) below is sufficient for the existence of canonical
correlations and weight functions. We remark that Condition (C2) implies Condition
(C1).

Condition (C2). Let X and Y be L2-processes, with Karhunen-Loeve decompositions 
(7) satisfying 




The proposed functional canonical regression analysis exploits features of functional
principal components and of functional canonical analysis. In functional principal component
analysis, one studies the structure of an $L_2$-process via its decomposition into the
eigenfunctions of its autocovariance operator, the Karhunen-Loève decomposition (Rice
and Silverman (1991)). In functional canonical analysis, the relation between a pair of
$L_2$-processes is analyzed by decomposing the processes into their canonical components.
The idea of canonical regression analysis is to expand the regression parameter function
in terms of functional canonical components for predictor and response processes.
The canonical regression decomposition (Theorem 3.3) below provides insights into the
structure of the regression parameter functions and not only aids in the understanding
of functional linear models, but also leads to promising estimation procedures for functional
regression analysis. The details of these estimation procedures will be discussed
in Section 4. We demonstrate in Section 5 that these estimates can lead to competitive
prediction errors in a finite-sample situation.

We now state two key results. The first of these (Theorem 3.2) provides the canonical
decomposition of the cross-covariance function of processes $X$ and $Y$. This result plays
a central role in the solution of the population normal equation (6). This solution is
referred to as canonical regression decomposition and it leads to an explicit representation
of the underlying regression parameter function $\beta_0^*(\cdot,\cdot)$ of the functional linear model
(4). The decomposition is in terms of functional canonical correlations $\rho_j$ and canonical
weight functions $u_j$ and $v_j$. Given a predictor process $X(t)$, we obtain, as a consequence,
an explicit representation for $E(Y(t)|X) = (\mathcal{L}_X\beta_0^*)(t)$, where $\mathcal{L}_X$ is as in (4). For the
following main results, we refer to the definitions of $\rho_j$, $u_j$, $v_j$, $U_j$, $V_j$ in Definition 3.1.
All proofs are found in Section 7.

Theorem 3.2 (Canonical decomposition of cross-covariance function). Assume
that $L_2$-processes $X$ and $Y$ satisfy Condition (C2). The cross-covariance function $r_{XY}$
then allows the following representation in terms of canonical correlations $\rho_j$ and weight
functions $u_j$ and $v_j$:

$$r_{XY}(s,t) = \sum_{m=1}^{\infty} \rho_m\,R_{XX}u_m(s)\,R_{YY}v_m(t). \tag{15}$$

Theorem 3.3 (Canonical regression decomposition). Assume that the $L_2$-processes
$X$ and $Y$ satisfy Condition (C2). One then obtains, for the regression parameter function
$\beta_0^*(\cdot,\cdot)$ (10), the following explicit solution:

$$\beta_0^*(s,t) = \sum_{m=1}^{\infty} \rho_m\,u_m(s)\,R_{YY}v_m(t). \tag{16}$$

To obtain the predicted value of the response process we use the linear predictor

$$Y^*(t) = E(Y(t)|X) = (\mathcal{L}_X\beta_0^*)(t) = \sum_{m=1}^{\infty} \rho_m\,U_m\,R_{YY}v_m(t). \tag{17}$$
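
As an informal consistency check (a sketch, not the proof given in Section 7), one can verify that the expansion $\beta_0^*(s,t) = \sum_m \rho_m u_m(s)(R_{YY}v_m)(t)$ of (16) satisfies the functional normal equation (6). The sketch assumes that the series converges in $L_2(T_1 \times T_2)$, so that the bounded operator $\Gamma_{XX}$ from (5) may be applied term by term.

```latex
\begin{align*}
(\Gamma_{XX}\beta_0^*)(s,t)
  &= \int_{T_1} r_{XX}(s,w) \sum_{m=1}^{\infty} \rho_m\, u_m(w)\,(R_{YY}v_m)(t)\,dw \\
  &= \sum_{m=1}^{\infty} \rho_m \Bigl( \int_{T_1} r_{XX}(s,w)\,u_m(w)\,dw \Bigr) (R_{YY}v_m)(t) \\
  &= \sum_{m=1}^{\infty} \rho_m\,(R_{XX}u_m)(s)\,(R_{YY}v_m)(t)
   \;=\; r_{XY}(s,t).
\end{align*}
```

The third line uses the definition of $R_{XX}$ and the final equality is the canonical decomposition (15) of the cross-covariance function.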

This canonical regression decomposition leads to approximations of the regression parameter
function $\beta_0^*$ and the predicted process $Y^*(t) = \mathcal{L}_X\beta_0^*$ via a finitely truncated
version of the canonical expansions (16) and (17). The following result provides approximation
errors incurred from finite truncation. Thus, we have a vehicle to achieve
practically feasible estimation of $\beta_0^*$ and associated predictions $Y^*$ (Section 4).

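Theorem 3.4. For $K \ge 1$, let $\beta_K^*(s,t) = \sum_{k=1}^{K} \rho_k\,u_k(s)\,R_{YY}v_k(t)$ be the finitely truncated
version of the canonical regression decomposition (16) for $\beta_0^*$ and define $Y_K^*(t) =
(\mathcal{L}_X\beta_K^*)(t)$. Then,

$$Y_K^*(t) = \sum_{k=1}^{K} \rho_k\,U_k\,R_{YY}v_k(t), \tag{18}$$

with $E[Y_K^*] = 0$. Moreover,

$$E\|Y^* - Y_K^*\|^2 = \sum_{k=K+1}^{\infty} \rho_k^2\,\|R_{YY}v_k\|^2 \to 0 \qquad \text{as } K \to \infty$$

and

$$E\|Y - Y_K^*\|^2 = E\|Y\|^2 - E\|\mathcal{L}_X\beta_K^*\|^2 = \operatorname{trace}(R_{YY}) - \sum_{k=1}^{K} \rho_k^2\,\|R_{YY}v_k\|^2. \tag{19}$$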

In finite-sample implementations, to be explored in the next two sections, truncation 
as in (18) is a practical necessity; this requires a choice of suitable truncation parameters. 

4. Estimation procedures 
4.1. Preliminaries 

Estimating the regression parameter function and obtaining fitted processes from the
linear model (2) based on a sample of curves is central to the implementation of functional
linear models. In practice, data are observed at discrete time points and we temporarily
assume, for simplicity, that the $N_X$ time points are the same for all observed predictor
curves and are equidistantly spaced over the domain of the data. Analogous assumptions
are made for the $N_Y$ time points where the response curves are sampled. Thus, the original
observations are $(X_i, Y_i)$, $i = 1, \ldots, n$, where $X_i$ is an $N_X$-dimensional vector sampled at
time points $s_j$, and $Y_i$ is an $N_Y$-dimensional vector sampled at time points $t_j$. We assume
that $N_X$ and $N_Y$ are both large. Without going into any analytical details, we compare
the finite-sample behavior of two functional regression methods, one of which utilizes the
canonical decomposition for regression and the other a well-established direct principal
component approach to implement functional linear regression.

The proposed practical version of functional regression analysis through functional
canonical regression analysis (FCR) is discussed in Section 4.2. This method is compared
with a more standard functional linear regression implementation that is based on
principal components and referred to as functional principal regression (FPR), in Section
4.3. For the choice of the smoothing parameters for the various smoothing steps,
we adopt leave-one-curve-out cross-validation (Rice and Silverman (1991)). Smoothing is
implemented by local linear fitting for functions and surfaces (Fan and Gijbels (1996)),
minimizing locally weighted least squares.

In a pre-processing step, all observed process data are centered by subtracting the
cross-sectional means, $X_i(s_j) - \frac{1}{n}\sum_{l=1}^{n} X_l(s_j)$, and analogously for $Y_i$. If the data are
not sampled on the same grid for different individuals, a smoothing step may be added
before the cross-sectional average is obtained. As in the previous sections, we use in the
following the notation $X, Y, X_i, Y_i$ to denote centered processes and trajectories.

When employing the Karhunen-Loève decomposition (7), we approximate observed
centered processes by the fitted versions

$$\hat{X}_i(s) = \sum_{l=1}^{L} \hat{\xi}_{il}\,\hat{\theta}_l(s), \qquad \hat{Y}_i(t) = \sum_{l=1}^{L} \hat{\zeta}_{il}\,\hat{\varphi}_l(t), \tag{20}$$

where $\{\hat{\theta}_l(s)\}_{l=1}^{L}$ and $\{\hat{\varphi}_l(t)\}_{l=1}^{L}$ are the estimated first $L$ smoothed eigenfunctions for the
random processes $X$ and $Y$, respectively, with the corresponding estimated eigenscores
$\{\hat{\xi}_{il}\}_{l=1}^{L}$ and $\{\hat{\zeta}_{il}\}_{l=1}^{L}$ for the $i$th subject. We obtain these estimates as described in Yao et
al. (2005a). Related estimation approaches, such as those of Rice and Silverman (1991)
or Ramsay and Silverman (2005), could alternatively be used.
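
The centering step and the truncated expansion (20) can be sketched as follows for curves observed on a common regular grid. This is a simplified stand-in and not the estimation method of Yao et al. (2005a): the smoothing steps are omitted, the raw sample covariance is used directly, and the function name `fpca` as well as the simulated curves are hypothetical.

```python
import numpy as np

def fpca(curves, grid, L):
    """Eigenvalues, eigenfunctions, eigenscores and fitted curves as in (20).

    curves: (n, N) array of curves on a regular grid; no smoothing is applied.
    """
    d = grid[1] - grid[0]
    centered = curves - curves.mean(axis=0)            # subtract cross-sectional means
    cov = centered.T @ centered / curves.shape[0]      # raw covariance surface
    w, V = np.linalg.eigh(cov * d)                     # discretized covariance operator
    order = np.argsort(w)[::-1][:L]                    # keep the L largest eigenvalues
    lam = w[order]
    phi = V[:, order].T / np.sqrt(d)                   # rows satisfy sum(phi_l^2) * d = 1
    scores = centered @ phi.T * d                      # Riemann sum for int X_i phi_l
    fitted = scores @ phi                              # truncated expansion (20)
    return lam, phi, scores, fitted

rng = np.random.default_rng(4)
s = np.linspace(0.0, 1.0, 101)
# Hypothetical curves: a constant mean plus two random components.
coef = rng.normal(size=(40, 2)) * np.array([1.5, 0.5])
X = 1.0 + coef[:, [0]] * np.sin(np.pi * s) + coef[:, [1]] * np.cos(np.pi * s)
lam, theta, xi, X_fit = fpca(X, s, L=2)
```

Because the simulated curves contain only two random components, the expansion with `L=2` reproduces the centered curves up to rounding error; with real data the truncation point is a tuning parameter.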



4.2. Functional canonical regression (FCR) 

To obtain functional canonical correlations and the corresponding weight functions as 
needed for FCR, we adopt one of the methods proposed in He et al. (2004). In preliminary 
studies, we determined that the eigenbase method as described there yielded the best 
performance for regression applications, with the Fourier base method a close second. 
Adopting the eigenbase method, the implementation of FCR is as follows: 

(i) Starting with the eigenscore estimates as in (20), estimated raw functional canonical
correlations $\hat{\rho}_l$ and $L$-dimensional weight vectors $\hat{\mathbf{u}}_l$, $\hat{\mathbf{v}}_l$, $l = 1, \ldots, L$, are obtained
by applying conventional numerical procedures of multivariate canonical analysis to the
estimated eigenscore vectors $(\hat{\xi}_{i1}, \ldots, \hat{\xi}_{iL})'$ and $(\hat{\zeta}_{i1}, \ldots, \hat{\zeta}_{iL})'$. This works empirically well
for moderately sized values of $L$, as typically obtained from automatic selectors.

(ii) Smooth weight function estimates $\hat{u}_l(t)$, $\hat{v}_l(t)$ are then obtained as

$$\hat{u}_l(t) = \hat{\mathbf{u}}_l'\,\hat{\theta}(t), \qquad \hat{v}_l(t) = \hat{\mathbf{v}}_l'\,\hat{\varphi}(t),$$

where $\hat{\theta}(t) = (\hat{\theta}_1(t), \ldots, \hat{\theta}_L(t))'$, $\hat{\varphi}(t) = (\hat{\varphi}_1(t), \ldots, \hat{\varphi}_L(t))'$.

(iii) The estimated regression parameter function $\hat{\beta}$ is obtained according to (16) by

$$\hat{\beta}(s,t) = \sum_{l=1}^{L} \hat{\rho}_l\,\hat{u}_l(s) \int_{T_2} \hat{r}_{YY}(w,t)\,\hat{v}_l(w)\,dw,$$

where $\hat{r}_{YY}$ is an estimate of the covariance function of $Y$, obtained by two-dimensional
smoothing of the empirical autocovariances of $Y$. This estimate is obtained as described
in Yao et al. (2005a). Since the data are regularly sampled, the above integrals are
easily obtained by the approximations $\sum_{j=1}^{N_Y} \hat{r}_{YY}(t_j, t)\,\hat{v}_l(t_j)(t_j - t_{j-1})$, $l = 1, \ldots, L$, with
$t_0$ defined analogously to $s_0$ in (22) below.

(iv) Fitted/predicted processes

$$\hat{Y}_i^*(t) = \int_{T_1} \hat{\beta}(s,t)\,X_i(s)\,ds \qquad \text{for } i = 1, \ldots, n, \tag{21}$$

are obtained, where the integral is again evaluated numerically by

$$\hat{Y}_i^*(t) = \sum_{j=1}^{N_X} \hat{\beta}(s_j, t)\,X_i(s_j)(s_j - s_{j-1}). \tag{22}$$

Here, $s_0$ is chosen such that $s_1 - s_0 = s_2 - s_1$.
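
Steps (i) to (iv) can be sketched in a few lines for centered curves on regular grids. The sketch below is an illustration under simplifying assumptions and not the implementation used in the paper: eigenfunctions and the covariance of $Y$ are taken from raw sample covariances without smoothing, all integrals are replaced by Riemann sums with constant spacing, and the function names and the simulated data are hypothetical.

```python
import numpy as np

def fpca(curves, d, L):
    """First L eigenfunctions (rows) and eigenscores of centered curves; spacing d."""
    w, V = np.linalg.eigh(curves.T @ curves / curves.shape[0] * d)
    phi = V[:, np.argsort(w)[::-1][:L]].T / np.sqrt(d)
    return phi, curves @ phi.T * d

def cca(A, B):
    """Canonical correlations and weight vectors (columns) of two score matrices."""
    n = A.shape[0]
    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T
    Wa, Wb = inv_sqrt(A.T @ A / n), inv_sqrt(B.T @ B / n)
    P, rho, Qt = np.linalg.svd(Wa @ (A.T @ B / n) @ Wb)
    return rho, Wa @ P, Wb @ Qt.T

def fcr_fit(X, Y, ds, dt, L):
    """Steps (i)-(iii): estimate beta(s, t) on the grid from centered curves."""
    theta, xi = fpca(X, ds, L)
    phi, zeta = fpca(Y, dt, L)
    rho, U, V = cca(xi, zeta)             # (i) multivariate CCA on the eigenscores
    u_fun = U.T @ theta                   # (ii) u_l(s) = u_l' theta(s), one row per l
    v_fun = V.T @ phi                     #      v_l(t) = v_l' phi(t)
    r_yy = Y.T @ Y / Y.shape[0]           # raw, unsmoothed covariance surface of Y
    Rv = v_fun @ r_yy * dt                # (R_YY v_l)(t) by a Riemann sum
    return (u_fun.T * rho) @ Rv           # (iii) sum_l rho_l u_l(s) (R_YY v_l)(t)

def fcr_predict(beta, X, ds):
    """Step (iv): Riemann-sum version (22) of the predictor (21)."""
    return X @ beta * ds

rng = np.random.default_rng(2)
s, t = np.linspace(0.0, 1.0, 51), np.linspace(0.0, 1.0, 41)
ds, dt = s[1] - s[0], t[1] - t[0]
# Hypothetical parameter surface and predictor curves, for illustration only.
beta0 = np.outer(np.sin(np.pi * s), t) + np.outer(np.cos(np.pi * s), 1.0 - t ** 2)
coef = rng.normal(size=(60, 2))
X = coef[:, [0]] * np.sin(np.pi * s) + coef[:, [1]] * np.cos(np.pi * s)
Y = X @ beta0 * ds + 0.05 * rng.normal(size=(60, t.size))
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)

beta_hat = fcr_fit(X, Y, ds, dt, L=2)
Y_hat = fcr_predict(beta_hat, X, ds)
```

In this sketch the number of components `L` is fixed by hand; in the procedure described in the text it is chosen together with the bandwidth by cross-validation.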

This procedure depends on two tuning parameters, a bandwidth $h$ for the smoothing
steps (which are defined in detail, e.g., in Yao et al. (2005a)) and the number of canonical
components $L$ that are included. These tuning parameters may be determined by
leave-one-out cross-validation (Rice and Silverman (1991)) as follows. With $\alpha = (h, L)$, the $i$th
leave-one-out estimate for $\beta$ is

$$\hat{\beta}_{\alpha}^{(-i)}(s,t) = \sum_{l=1}^{L} \hat{\rho}_l^{(-i)}\,\hat{u}_l^{(-i)}(s)\,\hat{R}_{YY}\hat{v}_l^{(-i)}(t) \qquad \text{for } i = 1, \ldots, n, \tag{23}$$

where $\hat{\rho}_l^{(-i)}$ is the $l$th canonical correlation, and $\hat{u}_l^{(-i)}$ and $\hat{R}_{YY}\hat{v}_l^{(-i)}$ are the $l$th weight
function untransformed and transformed with the covariance operator, respectively, all
obtained while leaving out the data for the $i$th subject. Computation of these estimates
follows steps (iii) and (iv) above, using tuning parameter $\alpha = (h, L)$, and omitting the
$i$th pair of observed curves $(X_i, Y_i)$. The average leave-one-out squared prediction error
is then

$$PE_{\alpha} = \frac{1}{n}\sum_{i=1}^{n} \int_{T_2} \Big( Y_i(t) - \int_{T_1} X_i(s)\,\hat{\beta}_{\alpha}^{(-i)}(s,t)\,ds \Big)^2 dt. \tag{24}$$

The cross-validation procedure then selects the tuning parameter that minimizes the
approximate average prediction error,

$$\hat{\alpha} = \arg\min_{\alpha} \widetilde{PE}_{\alpha},$$

where $\widetilde{PE}_{\alpha}$ is obtained by replacing the integrals on the right-hand side of (24) by sums
of the type (22).
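
The selection rule can be sketched generically as a loop over candidate tuning values, where each curve pair is left out in turn. To keep the example short, the estimator plugged in below is a simple truncated least-squares fit (`truncated_fit`, a hypothetical stand-in) with the number of components as its only tuning parameter; it is not the FCR estimator, and any function returning an estimate of $\beta$ on the grid could be substituted. The curves are assumed to be centered beforehand, and the simulated data are hypothetical.

```python
import numpy as np

def truncated_fit(X, Y, L, ds):
    """Stand-in estimator: least squares on the first L singular directions of X."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    coef = (U[:, :L].T @ Y) / sv[:L, None]
    return Vt[:L].T @ coef / ds                      # beta on the grid, (Ns, Nt)

def loo_prediction_error(X, Y, ds, dt, fit, alpha):
    """Riemann-sum version of the prediction error (24) for one tuning value."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta = fit(X[keep], Y[keep], alpha)          # estimate without curve pair i
        resid = Y[i] - X[i] @ beta * ds              # prediction error for curve i
        total += np.sum(resid ** 2) * dt
    return total / n

def select_alpha(X, Y, ds, dt, fit, grid):
    """Return the tuning value with the smallest leave-one-out error."""
    errors = [loo_prediction_error(X, Y, ds, dt, fit, a) for a in grid]
    return grid[int(np.argmin(errors))], errors

rng = np.random.default_rng(3)
s, t = np.linspace(0.0, 1.0, 51), np.linspace(0.0, 1.0, 41)
ds, dt = s[1] - s[0], t[1] - t[0]
# Hypothetical data with two signal components in the predictor curves.
coef = rng.normal(size=(30, 2)) * np.array([2.0, 1.0])
X = coef[:, [0]] * np.sin(np.pi * s) + coef[:, [1]] * np.cos(np.pi * s)
X = X + 0.01 * rng.normal(size=X.shape)
beta0 = (np.outer(np.sin(np.pi * s), np.cos(np.pi * t))
         + np.outer(np.cos(np.pi * s), np.sin(np.pi * t)))
Y = X @ beta0 * ds + 0.05 * rng.normal(size=(30, t.size))

fit = lambda Xtr, Ytr, L: truncated_fit(Xtr, Ytr, L, ds)
L_best, errors = select_alpha(X, Y, ds, dt, fit, grid=[1, 2, 3])
```

For a two-dimensional tuning parameter such as $\alpha = (h, L)$, the grid would simply contain pairs, at the cost of one refit per left-out curve and per grid point.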

4.3. Functional principal component regression (FPR)

Yao et al. (2005b) considered an implementation of functional linear regression whereby
one uses functional principal component analysis for predictor and response functions
separately, followed by simple linear regressions of the response principal component
scores on the predictor scores. We adopt this approach as FPR.

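Briefly, defining $\sigma_{mp} = E(\xi_m\zeta_p)$, this approach is based on representations

$$\beta(s,t) = \sum_{m=1}^{\infty}\sum_{p=1}^{\infty} \frac{\sigma_{mp}}{\lambda_{Xm}}\,\theta_m(s)\,\varphi_p(t)$$

of the regression parameter function $\beta(s,t)$, where

$$\sigma_{mp} = \int\!\!\int \theta_m(s)\,r_{XY}(s,t)\,\varphi_p(t)\,ds\,dt \tag{25}$$

for all $m$ and $p$.

For estimation, one first obtains a smooth estimate $\hat{r}_{XY}$ of the cross-covariance $r_{XY}$
by smoothing sample cross-covariances, for example, by the method described in Yao et
al. (2005b). This leads to estimates $\hat{\sigma}_{mp}$ of $\sigma_{mp}$, $1 \le m, p \le L$, by plugging in estimates
$\hat{r}_{XY}$ for $r_{XY}$ and $\hat{\theta}_l$, $\hat{\varphi}_l$ for eigenfunctions $\theta_l$, $\varphi_l$ (as described in Section 4.2), in combination
with approximating the integrals in (25) by appropriate sums. One may then use
these estimates in conjunction with estimates $\hat{\lambda}_{Xm}$ of eigenvalues $\lambda_{Xm}$ to arrive at the
estimate $\hat{\beta}$ of the regression parameter function $\beta(s,t)$ given by

$$\hat{\beta}(s,t) = \sum_{m=1}^{L}\sum_{p=1}^{L} \frac{\hat{\sigma}_{mp}}{\hat{\lambda}_{Xm}}\,\hat{\theta}_m(s)\,\hat{\varphi}_p(t).$$

For further details about numerical implementations, we refer to Yao et al. (2005b).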
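
The plug-in estimate can be sketched as follows for centered curves on regular grids. As a simplification that departs from Yao et al. (2005b), no smoothing is applied: the raw sample cross-covariance surface stands in for $\hat{r}_{XY}$, and the integrals in (25) are replaced by Riemann sums. The function name `fpr_fit` and the simulated data are hypothetical.

```python
import numpy as np

def fpr_fit(X, Y, ds, dt, L):
    """Principal-component-based estimate of beta(s, t) from centered curves."""
    n = X.shape[0]

    def eig(curves, d):
        # Leading L eigenvalues and eigenfunctions of the discretized covariance operator.
        w, V = np.linalg.eigh(curves.T @ curves / n * d)
        order = np.argsort(w)[::-1][:L]
        return w[order], V[:, order].T / np.sqrt(d)

    lam_x, theta = eig(X, ds)
    _, phi = eig(Y, dt)
    r_xy = X.T @ Y / n                                # raw cross-covariance surface
    sigma = theta @ r_xy @ phi.T * ds * dt            # (25) evaluated by Riemann sums
    # Double sum of sigma_mp / lambda_Xm * theta_m(s) * phi_p(t).
    return theta.T @ (sigma / lam_x[:, None]) @ phi

rng = np.random.default_rng(5)
s, t = np.linspace(0.0, 1.0, 51), np.linspace(0.0, 1.0, 41)
ds, dt = s[1] - s[0], t[1] - t[0]
# Hypothetical parameter surface and predictor curves, for illustration only.
beta0 = np.outer(np.sin(np.pi * s), t) + np.outer(np.cos(np.pi * s), 1.0 - t ** 2)
coef = rng.normal(size=(60, 2))
X = coef[:, [0]] * np.sin(np.pi * s) + coef[:, [1]] * np.cos(np.pi * s)
Y = X @ beta0 * ds + 0.05 * rng.normal(size=(60, t.size))
X, Y = X - X.mean(axis=0), Y - Y.mean(axis=0)

beta_hat = fpr_fit(X, Y, ds, dt, L=2)
Y_hat = X @ beta_hat * ds                             # fitted curves as in (22)
```

The division by the estimated eigenvalues is the step that makes small eigenvalues influential, which is why the number of included components acts as a regularization parameter.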



5. Application to medfly mortality data

In this section, we present an application to age-at-death data that were collected for
cohorts of male and female medflies in a biodemographic study of survival and mortality
patterns of cohorts of male and female Mediterranean fruit flies (Ceratitis capitata; for
details, see Carey et al. (2002)). A point of interest in this study is the relation of mortality
trajectories between male and female medflies which were raised in the same cage. One
specifically desires to quantify the influence of male survival on female survival. This is of
interest because female survival determines the number of eggs laid and thus reproductive
success of these flies. We use a subsample of the data generated by this experiment,
comprising 46 cages of medflies, to address these questions. Each cage contains both a
male and a female cohort, consisting of approximately 4000 male and 4000 female
medflies, respectively. These flies were raised in the shared cage from the time of eclosion. For each
cohort, the number of flies alive at the beginning of each day was recorded, simply
by counting the dead flies on each day; we confined the analysis to the first 40 days.
The observed processes $X_i(t)$ and $Y_i(t)$, $t = 1, \ldots, 40$, $i = 1, \ldots, 46$, are the estimated
random hazard functions for male and female cohorts, respectively. All deaths are fully
observed so that censoring is not an issue. In a pre-processing step, cohort-specific hazard
functions were estimated nonparametrically from the lifetable data, implementing the
transformation approach described in Müller et al. (1997a).

A functional linear model was used to study the specific influence of male mortality 
on female mortality for flies that were raised in the same cage, with the hazard function 
of males as predictor process and that of females as response process. We applied both 
the proposed regression via canonical representation (FCR) and the more conventional 
functional regression based on principal components (FPR), implementing the estimation 
procedures described in the previous section. Tuning parameters were selected by cross- 
validation. Table 1 lists the average squared prediction error (PE) (24) obtained by the 
leave-one-out technique. For this application, the FCR procedure is seen to perform about 
20% better than FPR in terms of PE. 
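
The size of this difference can be read off the prediction errors in Table 1 (0.0100 for FCR and 0.0121 for FPR). The short calculation below shows the two natural ways of expressing it; which baseline underlies the "about 20%" statement is not specified in the text, so both are given, and the variable names are hypothetical.

```python
pe_fcr, pe_fpr = 0.0100, 0.0121          # PE values as reported in Table 1

# Relative reduction in PE when moving from FPR to FCR (FPR as baseline).
reduction = (pe_fpr - pe_fcr) / pe_fpr

# Relative excess PE of FPR over FCR (FCR as baseline).
excess = pe_fpr / pe_fcr - 1.0

print(f"reduction: {reduction:.1%}, excess: {excess:.1%}")
```

The two figures are roughly 17% and 21%, which brackets the rounded value quoted above.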

The estimated regression parameter surface /3(s,t) that is obtained for the FCR re- 
gression when choosing the cross- validated values for h and i, as given in Tabic 1, is 
shown in Figure 1. The shape of the regression surface indicates that female mortality 
at later ages is very clearly affected by male mortality throughout male lifespan, while 
female mortality at very early ages is not much influenced by male mortality. The effect 
of male mortality on female mortality is periodically elevated, as evidenced by the bumps 
visible in the surface. The particularly influential predictive periods are male mortality 
around days 10 and 20, which then has a particularly large influence on female mortality 
around days 15 and 25, that is, about five days later, and, again, around days 35 and 40,
judging from the locations of the peaks in the surface of $\beta(s,t)$. In contrast, enhanced
male mortality around day 30 leads to lessened female mortality throughout, while en- 
hanced male mortality at age 40 is associated with higher older-age female mortality. 
These observations point to the existence of periodic waves of mortality, first affecting 
males and subsequently females. While some of the waves of increased male mortality 
tend to be associated with subsequently increased female mortality, others are associated 
with subsequently decreased female mortality. 

These waves of mortality might be related to the so-called "vulnerable periods" that are 
characterized by locally heightened mortality (Müller et al. (1997b)). One such vulnerable

Table 1. Results for medfly data, comparing functional canonical regression (FCR) and
functional principal component regression (FPR) with regard to average leave-one-out squared
prediction error (PE) (24); values for bandwidth h and number L of components as chosen by
cross-validation are also shown

          h       L    PE
  FCR     1.92    3    0.0100
  FPR     1.65    3    0.0121

Figure 1. Estimated regression parameter surface obtained by functional canonical regression
for the medfly study. 

period occurs around ages 10 and 20, and the analysis suggests that heightened male 
mortality during these phases is indicative of heightened female mortality. In contrast, 
heightened male mortality during a non-vulnerable period such as the time around 30 
days seems to be associated with lower female mortality. A word of caution is in order 
as no inference methods are available to establish that the bumps observed in $\beta(s,t)$ are
real, so one cannot exclude the possibility that these bumps are enhanced by random 
fluctuations in the data. 

Examples of observed, as well as predicted, female mortality trajectories for three 
randomly selected pairs of cohorts (male and female flies raised in the same cages) are 
displayed in Figure 2. The predicted female trajectories were constructed by applying 
both regression methods (FCR and FPR) with the leave-one-out technique. The predic- 
tion of an individual response trajectory from a predictor trajectory cannot, of course, 
be expected to be very close to the actually observed response trajectory, due to the 
extra random variation that is a large inherent component of response variability; this is 
analogous to the situation of predicting an individual response in the well-known simple 
linear regression case. Nevertheless, overall, FCR predictions are found to be closer to 
the target. 

We note the presence of a "shoulder" at around day 20 for the three female mortality 
curves. This "shoulder" is related to the wave phenomenon visible in $\beta(s,t)$ as discussed

Figure 2. Functional regression of female (response) on male (predictor) medfly trajectories 
quantifying mortality in the form of cohort hazard functions for three cages of flies. Shown are 
actually observed female trajectories (solid) that are not used in the prediction, as well as the 
predictions for these trajectories obtained through estimation procedures based on functional 
principal component regression (FPR) (dash-dot) and on functional canonical regression (FCR) 
(dashed). 

above and corresponds to a phase of elevated female mortality. The functional regression 
method based on FCR correctly predicts the shoulder effect and its overall shape in 
female mortality. At the rightmost points, for ages near 40 days, the variability of the 
mortality trajectories becomes large, posing extra difficulties for prediction in the right 
tail of the trajectories. 

6. Additional results 

Theorems 6.3 and 6.4 in this section provide a functional analog to the sums-of-squares 
decomposition of classical regression analysis. In addition, we provide two results
characterizing the regression operator $\mathcal{L}_X$. We begin with two auxiliary results which are taken
from He et al. (2003). The first of these characterizes the correlation operator between 
processes X and Y . 

Lemma 6.1. Assume that the $L_2$-processes $X$ and $Y$ satisfy Condition (C2). The
correlation operator $R_{XX}^{-1/2} R_{XY} R_{YY}^{-1/2}$ can then be extended continuously to a Hilbert-Schmidt
operator $R$ on $L_2(T_2)$ to $L_2(T_1)$. Hence, $R_0 = R^*R$ is also a Hilbert-Schmidt operator
with a countable number of non-zero eigenvalues and eigenfunctions $\{(\lambda_m, q_m)\}$,
$m \ge 1$, $\lambda_1 \ge \lambda_2 \ge \cdots$, $p_m = R q_m / \sqrt{\lambda_m}$. Then:

(a) $\rho_m = \sqrt{\lambda_m}$, $u_m = R_{XX}^{-1/2} p_m$, $v_m = R_{YY}^{-1/2} q_m$ and both $u_m$ and $v_m$ are $L_2$-functions;

(b) $\mathrm{corr}(U_m, U_j) = \langle u_m, R_{XX} u_j \rangle = \langle p_m, p_j \rangle = \delta_{mj}$;

(c) $\mathrm{corr}(V_m, V_j) = \langle v_m, R_{YY} v_j \rangle = \langle q_m, q_j \rangle = \delta_{mj}$;

(d) $\mathrm{corr}(U_m, V_j) = \langle u_m, R_{XY} v_j \rangle = \langle p_m, R q_j \rangle = \rho_m \delta_{mj}$.
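
For finite-dimensional random vectors, the quantities in Lemma 6.1 reduce to matrix computations, which may help in reading the lemma. The following Python sketch is an illustration under assumptions, not part of the functional theory: it uses an arbitrary synthetic positive definite joint covariance matrix `Sigma` in place of the covariance operators, and the helper `inv_sqrt` and all variable names are hypothetical. The singular value decomposition of $R$ yields $\rho_m$, $p_m$ and $q_m$ at once, and the orthonormality relations (b) to (d) can then be checked numerically.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Synthetic joint covariance of (X, Y) with dim(X) = 4 and dim(Y) = 3.
rng = np.random.default_rng(1)
p1, p2 = 4, 3
A = rng.normal(size=(p1 + p2, p1 + p2))
Sigma = A @ A.T + np.eye(p1 + p2)          # positive definite by construction
Rxx, Rxy, Ryy = Sigma[:p1, :p1], Sigma[:p1, p1:], Sigma[p1:, p1:]

# Matrix analog of R = Rxx^{-1/2} Rxy Ryy^{-1/2}.
R = inv_sqrt(Rxx) @ Rxy @ inv_sqrt(Ryy)

# SVD: columns of P are p_m, rows of Qt are q_m, rho holds rho_m.
P, rho, Qt = np.linalg.svd(R, full_matrices=False)

# Canonical weights u_m = Rxx^{-1/2} p_m and v_m = Ryy^{-1/2} q_m (as columns).
U_w = inv_sqrt(Rxx) @ P
V_w = inv_sqrt(Ryy) @ Qt.T

# Eigenvalues of R_0 = R^T R, sorted in decreasing order.
lam = np.sort(np.linalg.eigvalsh(R.T @ R))[::-1]
```

In this sketch, `lam` agrees with `rho ** 2`, `U_w.T @ Rxx @ U_w` and `V_w.T @ Ryy @ V_w` are identity matrices, and `U_w.T @ Rxy @ V_w` is the diagonal matrix of canonical correlations, mirroring (a) to (d).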
One of the main results in He et al. (2003) reveals that the $L_2$-processes $X$ and $Y$ can
be expressed as sums of uncorrelated component functions and the correlation between 
the mth components of the expansion is the mth corresponding functional canonical 
correlation between the two processes. 

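Lemma 6.2 (Canonical decomposition). Assume $L_2$-processes $X$ and $Y$ satisfy
Condition (C2). There then exists a decomposition:

(a)

$$X = X_{c,K} + X_{c,K}^{\perp}, \qquad Y = Y_{c,K} + Y_{c,K}^{\perp},$$

where

$$X_{c,K} = \sum_{j=1}^{K} U_j R_{XX} u_j, \qquad X_{c,K}^{\perp} = X - X_{c,K},$$

$$Y_{c,K} = \sum_{j=1}^{K} V_j R_{YY} v_j, \qquad Y_{c,K}^{\perp} = Y - Y_{c,K}.$$

The index $K$ stands for canonical decomposition with $K$ components, and $U_j$, $V_j$, $u_j$, $v_j$
are as in Definition 3.1. Here, $(X,Y)$ and $(X_{c,K}, Y_{c,K})$ share the same first $K$ canonical
components, and $(X_{c,K}, Y_{c,K})$ and $(X_{c,K}^{\perp}, Y_{c,K}^{\perp})$ are uncorrelated, that is,

$$\mathrm{corr}(X_{c,K}, X_{c,K}^{\perp}) = 0, \qquad \mathrm{corr}(Y_{c,K}, Y_{c,K}^{\perp}) = 0,$$

$$\mathrm{corr}(X_{c,K}, Y_{c,K}^{\perp}) = 0, \qquad \mathrm{corr}(Y_{c,K}, X_{c,K}^{\perp}) = 0.$$

(b) Let $K \to \infty$ and $X_{c,\infty} = \sum_{m=1}^{\infty} U_m R_{XX} u_m$, $Y_{c,\infty} = \sum_{m=1}^{\infty} V_m R_{YY} v_m$. Then

$$X = X_{c,\infty} + X_{c,\infty}^{\perp}, \qquad Y = Y_{c,\infty} + Y_{c,\infty}^{\perp},$$

where $X_{c,\infty}^{\perp} = X - X_{c,\infty}$, $Y_{c,\infty}^{\perp} = Y - Y_{c,\infty}$. Here, $(X_{c,\infty}, Y_{c,\infty})$ and $(X,Y)$ share the
same canonical components, $\mathrm{corr}(X_{c,\infty}^{\perp}, Y_{c,\infty}^{\perp}) = 0$, and $(X_{c,\infty}^{\perp}, Y_{c,\infty}^{\perp})$ and $(X_{c,\infty}, Y_{c,\infty})$
are uncorrelated. Moreover, $X_{c,\infty}^{\perp} = 0$ if $\{p_m, m \ge 1\}$ forms a basis of the closure of the
domain of $R_{XX}$ and $Y_{c,\infty}^{\perp} = 0$ if $\{q_m, m \ge 1\}$ forms a basis of the closure of the domain
of $R_{YY}$.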

Since the covariance operators of $L_2$-processes are non-negative self-adjoint, they can
be ordered as follows. The definitions of $Y^*$, $Y_K^*$ and $Y_{c,\infty}$ are in (17), (18) and Lemma 6.2(b),
respectively.

Theorem 6.3. For $K \ge 1$, $R_{Y_K^* Y_K^*} \le R_{Y^* Y^*} \le R_{Y_{c,\infty} Y_{c,\infty}} \le R_{YY}$.

In multiple regression analysis, the ordering of the operators in Theorem 6.3 is related 
to the ordering of regression models in terms of a notion analogous to the regression sum 
of squares (SSR). The canonical regression decomposition provides information about
the model in terms of its canonical components. Our next result describes the canonical
correlations between observed and fitted processes. This provides an extension of the
coefficient of multiple determination, $R^2 = \mathrm{corr}^2(Y, \hat{Y})$, an important quantity in classical
multiple regression analysis, to the functional case; compare also Yao et al. (2005b).

Theorem 6.4. Assume that $L_2$-processes $X$ and $Y$ satisfy Condition (C2). The canonical
correlations and weight functions for the pair of observed and fitted response processes
$(Y, Y^*)$ are then $\{(\rho_m, v_m, v_m/\rho_m);\ m \ge 1\}$ and the corresponding $K$-component (or
$\infty$-component) canonical decomposition for $Y^*$, as defined in Lemma 6.2 for $K \ge 1$ and
denoted here by $Y_{c,K}^*$ (or $Y_{c,\infty}^*$), is equivalent to the process $Y_K^*$ or $Y^*$ given in Theorem
3.4, that is,

$$Y_{c,K}^* = Y_K^* = \sum_{m=1}^{K} \rho_m U_m R_{YY} v_m, \quad K \ge 1, \qquad Y_{c,\infty}^* = Y^* = \sum_{m=1}^{\infty} \rho_m U_m R_{YY} v_m. \qquad (26)$$

We note that if $Y$ is a scalar, then $R = \rho_1$, and for a functional response $Y$, $R$ is
replaced by the set $\{\rho_m, m \ge 1\}$.

The following two results serve to characterize the regression operator $\mathcal{L}_X$ defined in
(4). They are used in the proofs provided in the following section.

Proposition 6.5. The adjoint operator of $\mathcal{L}_X$ is $\mathcal{L}_X^*: L_2(T_2) \to L_2(T_1 \times T_2)$, where
$(\mathcal{L}_X^* z)(s,t) = X(s) z(t)$ for $z \in L_2(T_2)$.

We have the following relation between the correlation operator $\Gamma_{XX}$ defined in (5)
and the regression operator $\mathcal{L}_X$.

Proposition 6.6. The operator $\Gamma_{XX}$ is a self-adjoint non-negative Hilbert-Schmidt
operator and satisfies $\Gamma_{XX} = E[\mathcal{L}_X^* \mathcal{L}_X]$.
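
Propositions 6.5 and 6.6 can be checked numerically on a grid. The Python sketch below is an illustration under assumptions rather than a proof: it assumes that the regression operator acts as $(\mathcal{L}_X \beta)(t) = \int X(s)\beta(s,t)\,ds$ and that $\Gamma_{XX}$ is the integral operator with kernel $r_{XX}(s,w) = E[X(s)X(w)]$, replaces integrals by Riemann sums and the expectation by an average over synthetic sample paths. The grid sizes, the random-walk paths and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, N = 30, 25, 500                   # grid sizes on T1, T2; sample size
ds, dt = 1.0 / n1, 1.0 / n2               # grid spacings

# Synthetic sample paths X_i(s): scaled random walks, one path per row.
X = rng.normal(size=(N, n1)).cumsum(axis=1) * np.sqrt(ds)
beta = rng.normal(size=(n1, n2))          # test function beta(s, t)
beta2 = rng.normal(size=(n1, n2))         # second test function
z = rng.normal(size=n2)                   # test function z(t)

def L(x, b):
    """(L_X b)(t): Riemann sum of x(s) b(s, t) over s."""
    return (x @ b) * ds

def L_adj(x, zz):
    """(L_X^* z)(s, t) = x(s) z(t), as stated in Proposition 6.5."""
    return np.outer(x, zz)

def inner_T2(f, g):
    return np.sum(f * g) * dt

def inner_T1T2(f, g):
    return np.sum(f * g) * ds * dt

# Adjoint relation <L_X beta, z> = <beta, L_X^* z> for one path.
x = X[0]
lhs = inner_T2(L(x, beta), z)
rhs = inner_T1T2(beta, L_adj(x, z))

# Empirical kernel r_XX(s, w) and the operator Gamma_XX.
r_xx = X.T @ X / N

def Gamma(b):
    """(Gamma_XX b)(s, t): Riemann sum of r_XX(s, w) b(w, t) over w."""
    return (r_xx @ b) * ds

# Quadratic form <Gamma beta, beta> versus the average of ||L_X beta||^2.
quad = inner_T1T2(Gamma(beta), beta)
mean_sq = np.mean([inner_T2(L(xi, beta), L(xi, beta)) for xi in X])

# Self-adjointness: <Gamma beta, beta2> versus <beta, Gamma beta2>.
sa_left = inner_T1T2(Gamma(beta), beta2)
sa_right = inner_T1T2(beta, Gamma(beta2))
```

In this discretization, `lhs` equals `rhs`, `quad` equals `mean_sq` and is non-negative, and `sa_left` equals `sa_right`, up to floating-point error.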

7. Proofs 

In this section, we provide sketches of proofs and some auxiliary results. We use tensor
notation to define an operator $\theta \otimes \varphi: H \to H$,

$$(\theta \otimes \varphi)(h) = \langle h, \theta \rangle \varphi \quad \text{for } h \in H.$$

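Proof of Proposition 2.2. To prove (a) $\Rightarrow$ (b), we multiply equation (4) by $X$ on
both sides and take expected values to obtain $E(XY) = E(X \mathcal{L}_X \beta_0) + E(X\varepsilon)$. Equation
(6) then follows from $E(XY) = r_{XY}$, $E(X \mathcal{L}_X \beta_0) = \Gamma_{XX}\beta_0$ (by Propositions 6.5 and 6.6)
and $E(X\varepsilon) = 0$.

For (b) $\Rightarrow$ (c), let $\beta_0$ be a solution of equation (6). For any $\beta \in L_2(T_1 \times T_2)$, we then
have

$$E\|Y - \mathcal{L}_X\beta\|^2 = E\|Y - \mathcal{L}_X\beta_0\|^2 + E\|\mathcal{L}_X(\beta_0 - \beta)\|^2 + 2E\langle Y - \mathcal{L}_X\beta_0, \mathcal{L}_X(\beta_0 - \beta)\rangle.$$

Since

$$E\langle Y - \mathcal{L}_X\beta_0, \mathcal{L}_X(\beta_0 - \beta)\rangle = E\langle \mathcal{L}_X^* Y - \mathcal{L}_X^*\mathcal{L}_X\beta_0, \beta_0 - \beta\rangle$$

$$= \langle E[\mathcal{L}_X^* Y] - E[\mathcal{L}_X^*\mathcal{L}_X\beta_0], \beta_0 - \beta\rangle = \langle r_{XY} - \Gamma_{XX}\beta_0, \beta_0 - \beta\rangle = 0,$$

by Proposition 6.6, we then have

$$E\|Y - \mathcal{L}_X\beta\|^2 = E\|Y - \mathcal{L}_X\beta_0\|^2 + E\|\mathcal{L}_X(\beta_0 - \beta)\|^2 \ge E\|Y - \mathcal{L}_X\beta_0\|^2,$$

which implies that $\beta_0$ is indeed a minimizer of $E\|Y - \mathcal{L}_X\beta\|^2$.

For (c) $\Rightarrow$ (a), let

$$d^2 = E\|Y - \mathcal{L}_X\beta_0\|^2 = \min_{\beta \in L_2(T_1 \times T_2)} E\|Y - \mathcal{L}_X\beta\|^2.$$

Then, for any $\beta \in L_2(T_1 \times T_2)$, $a \in \mathbb{R}$,

$$d^2 = E\|Y - \mathcal{L}_X\beta_0\|^2 \le E\|Y - \mathcal{L}_X(\beta_0 + a\beta)\|^2$$

$$= E\|Y - \mathcal{L}_X\beta_0\|^2 - 2E\langle Y - \mathcal{L}_X\beta_0, \mathcal{L}_X(a\beta)\rangle + E\|\mathcal{L}_X(a\beta)\|^2$$

$$= d^2 - 2a\langle E[X(Y - \mathcal{L}_X\beta_0)], \beta\rangle + a^2 E\|\mathcal{L}_X\beta\|^2.$$

Choosing $a = \langle E[X(Y - \mathcal{L}_X\beta_0)], \beta\rangle / E\|\mathcal{L}_X\beta\|^2$, it follows that $|\langle E[X(Y - \mathcal{L}_X\beta_0)], \beta\rangle|^2 / E\|\mathcal{L}_X\beta\|^2 \le 0$ and $\langle E[X(Y - \mathcal{L}_X\beta_0)], \beta\rangle = 0$. Since $\beta$ is arbitrary, $E[X(Y - \mathcal{L}_X\beta_0)] = 0$
and therefore $\beta_0$ satisfies the functional linear model (4). □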
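
The step (b) implies (c) has a direct finite-dimensional counterpart that can be verified numerically. The Python sketch below is a hedged illustration only: it replaces the processes by random vectors, the operators by matrices and the expectation by a sample average over synthetic data, so that the normal equation becomes `G @ B0 = r`. All names (`G`, `r`, `B0`, `risk`) are hypothetical. The sketch checks that the cross term vanishes at a solution of the normal equation and that the squared error of any other coefficient matrix splits into the minimal error plus a non-negative excess.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p1, p2 = 400, 5, 4

# Synthetic data from a linear model Y = X B_true + noise.
X = rng.normal(size=(N, p1))
B_true = rng.normal(size=(p1, p2))
Y = X @ B_true + 0.3 * rng.normal(size=(N, p2))

# Sample analogs of Gamma_XX and r_XY, and the normal-equation solution.
G = X.T @ X / N
r = X.T @ Y / N
B0 = np.linalg.solve(G, r)

def risk(B):
    """Sample analog of E||Y - L_X B||^2."""
    return np.mean(np.sum((Y - X @ B) ** 2, axis=1))

# An arbitrary competitor B.
B = B0 + rng.normal(size=(p1, p2))

# Excess term, the analog of E||L_X (B0 - B)||^2.
excess = np.mean(np.sum((X @ (B0 - B)) ** 2, axis=1))

# Cross term, the analog of <r_XY - Gamma_XX B0, B0 - B>.
cross = np.sum((r - G @ B0) * (B0 - B))
```

Here `cross` is zero up to rounding and `risk(B)` equals `risk(B0) + excess`, so `B0` minimizes the sample squared error, in line with the argument above.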

Proof of Theorem 2.3. Note, first, that $r_{XY}(s,t) = \sum_{m,j} E[\xi_m \zeta_j]\theta_m(s)\varphi_j(t)$. Thus,
Condition (C1) is equivalent to $r_{XY} \in G_{XX}$. Suppose that a unique solution of (4) exists
in $\ker(\Gamma_{XX})^{\perp}$. This solution is then also a solution of (6), by Proposition 2.2(b).
Therefore, $r_{XY} \in G_{XX}$, which implies (C1). On the other hand, if (C1) holds, then
$r_{XY} \in G_{XX}$, which implies that $\Gamma_{XX}^{-1} r_{XY} = \sum_{m,j} \lambda_{Xm}^{-1} \langle r_{XY}, \theta_m\varphi_j\rangle \theta_m\varphi_j$ is a solution of
(6), is in $\ker(\Gamma_{XX})^{\perp}$ and, therefore, is the unique solution in $\ker(\Gamma_{XX})^{\perp}$ and also the
unique solution of (4) in $\ker(\Gamma_{XX})^{\perp}$. □

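Proof of Proposition 2.4. The equivalence of (a), (b) and (c) follows from Proposition
2.2 and (d) $\Rightarrow$ (b) is a consequence of Theorem 2.3. We now prove (b) $\Rightarrow$ (d). Let
$\beta_0$ be a solution of (6). Proposition 2.2 and Theorem 2.3 imply that both $\beta_0$ and $\beta_0^*$
minimize $E\|Y - \mathcal{L}_X\beta\|^2$ for $\beta \in L_2(T_1 \times T_2)$. Hence,

$$E\|Y - \mathcal{L}_X\beta_0\|^2 = E\|Y - \mathcal{L}_X\beta_0^*\|^2 + E\|\mathcal{L}_X(\beta_0^* - \beta_0)\|^2 + 2E\langle Y - \mathcal{L}_X\beta_0^*, \mathcal{L}_X(\beta_0^* - \beta_0)\rangle,$$

where, by Proposition 6.6, the cross term satisfies
$2E\langle \mathcal{L}_X^*(Y - \mathcal{L}_X\beta_0^*), \beta_0^* - \beta_0\rangle = 2\langle r_{XY} - \Gamma_{XX}\beta_0^*, \beta_0^* - \beta_0\rangle = 0$. Therefore,
$E\|\mathcal{L}_X(\beta_0^* - \beta_0)\|^2 = \|\Gamma_{XX}^{1/2}(\beta_0^* - \beta_0)\|^2 = 0$. It follows that $\beta_0^* - \beta_0 \in \ker(\Gamma_{XX})$, or $\beta_0 = \beta_0^* + h$, for an
$h \in \ker(\Gamma_{XX})$. □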
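Proof of Theorem 3.2. According to Lemma 6.2(b), Condition (C2) guarantees the
existence of the canonical components and canonical decomposition of $X$ and $Y$.
Moreover,

$$r_{XY}(s,t) = E[X(s)Y(t)] = E[(X_{c,\infty}(s) + X_{c,\infty}^{\perp}(s))(Y_{c,\infty}(t) + Y_{c,\infty}^{\perp}(t))]$$

$$= E[X_{c,\infty}(s)Y_{c,\infty}(t)] = E\left[\sum_{m=1}^{\infty} U_m R_{XX}u_m(s) \sum_{j=1}^{\infty} V_j R_{YY}v_j(t)\right]$$

$$= \sum_{m,j=1}^{\infty} E[U_m V_j] R_{XX}u_m(s) R_{YY}v_j(t) = \sum_{m=1}^{\infty} \rho_m R_{XX}u_m(s) R_{YY}v_m(t).$$

We now show that the exchange of the expectation with the summation above is valid.
From Lemma 6.1(b), for any $K \ge 1$ and the spectral decomposition $R_{XX} = \sum_m \lambda_{Xm}\theta_m \otimes \theta_m$,

$$\sum_{m=1}^{K} E\|U_m R_{XX}u_m\|^2 = \sum_{m=1}^{K} \|R_{XX}^{1/2}p_m\|^2 = \sum_{m=1}^{K}\sum_{j=1}^{\infty} \lambda_{Xj}\langle p_m, \theta_j\rangle^2 = \sum_{j=1}^{\infty} \lambda_{Xj}\left(\sum_{m=1}^{K}\langle p_m, \theta_j\rangle^2\right)$$

$$\le \sum_{j=1}^{\infty} \lambda_{Xj}\|\theta_j\|^2 = \sum_{j=1}^{\infty} \lambda_{Xj} < \infty,$$

where the inequality follows from the fact that $\sum_{m=1}^{K}\langle p_m, \theta_j\rangle^2$ is the square length of
the projection of $\theta_j$ onto the linear subspace spanned by $\{p_1, \ldots, p_K\}$. Similarly, we can
show that for any $K \ge 1$,

$$\sum_{m=1}^{K} E\|V_m R_{YY}v_m\|^2 \le \sum_{j=1}^{\infty} \lambda_{Yj} < \infty.$$

□
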

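Proof of Theorem 3.3. Note that Condition (C2) implies Condition (C1). Hence, from
Theorem 2.3, $\beta_0^* = \Gamma_{XX}^{-1} r_{XY}$ exists and is unique in $\ker(\Gamma_{XX})^{\perp}$. We can show (16) by
applying $\Gamma_{XX}^{-1}$ to both sides of (6), exchanging the order of summation and integration.
To establish (17), it remains to show that

$$\sum_{m=1}^{\infty} \|\rho_m u_m R_{YY}v_m\|^2 < \infty. \qquad (27)$$

Note that

$$\rho_m u_m = \rho_m R_{XX}^{-1/2} p_m = R_{XX}^{-1/2} R q_m = \sum_{j} \frac{1}{\sqrt{\lambda_{Xj}}} \langle R q_m, \theta_j\rangle \theta_j,$$

where the operator $R = R_{XX}^{-1/2} R_{XY} R_{YY}^{-1/2}$ is defined in Lemma 6.1 and can be written as
$R = \sum_{k,\ell} r_{k\ell}\, \varphi_\ell \otimes \theta_k$, with $r_{k\ell} = E[\xi_k\zeta_\ell]/\sqrt{\lambda_{Xk}\lambda_{Y\ell}}$, using the Karhunen-Loève expansion
(7). Then,

$$R q_m = \sum_{k,\ell} r_{k\ell}\langle \varphi_\ell, q_m\rangle \theta_k, \qquad \langle R q_m, \theta_j\rangle = \sum_{\ell} r_{j\ell}\langle \varphi_\ell, q_m\rangle$$

and, therefore,

$$\sum_m \|\rho_m u_m R_{YY}v_m\|^2 = \sum_m \|\rho_m u_m\|^2 \|R_{YY}v_m\|^2 = \sum_m \left[\sum_j \frac{1}{\lambda_{Xj}}\left(\sum_\ell r_{j\ell}\langle \varphi_\ell, q_m\rangle\right)^2\right] \|R_{YY}v_m\|^2$$

$$\le \sum_m \left[\sum_j \frac{1}{\lambda_{Xj}} \sum_\ell r_{j\ell}^2 \sum_\ell \langle \varphi_\ell, q_m\rangle^2\right] \|R_{YY}v_m\|^2 \le \sum_{j,\ell} \frac{r_{j\ell}^2}{\lambda_{Xj}} \sum_m \|R_{YY}v_m\|^2, \quad \text{as } \|q_m\| = 1.$$

Note that by (C2), the first sum on the right-hand side is bounded. For the second sum,

$$\sum_m \|R_{YY}v_m\|^2 = \sum_m \|R_{YY}^{1/2}q_m\|^2 = \sum_m \langle R_{YY}q_m, q_m\rangle = \sum_m\sum_j \lambda_{Yj}\langle \varphi_j, q_m\rangle^2 \le \sum_j \lambda_{Yj} < \infty,$$

which implies (27). □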

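Proof of Theorem 3.4. Observing

$$Y_K^* = \mathcal{L}_X\beta_K^* = \sum_{m=1}^{K} \rho_m \mathcal{L}_X(u_m) R_{YY}v_m = \sum_{m=1}^{K} \rho_m \langle u_m, X\rangle R_{YY}v_m = \sum_{m=1}^{K} \rho_m U_m R_{YY}v_m,$$

$$E\|Y^* - Y_K^*\|^2 = E\left\|\sum_{m=K+1}^{\infty} \rho_m U_m R_{YY}v_m\right\|^2 = \sum_{m=K+1}^{\infty} \rho_m^2 \|R_{YY}v_m\|^2$$

and

$$E\|\mathcal{L}_X\beta_K^*\|^2 = E\left\|\sum_{m=1}^{K} \rho_m U_m R_{YY}v_m\right\|^2 = \sum_{m,j=1}^{K} \rho_m\rho_j E[U_m U_j]\langle R_{YY}v_m, R_{YY}v_j\rangle = \sum_{m=1}^{K} \rho_m^2\|R_{YY}v_m\|^2 < \infty,$$

we infer that $E\|Y^* - Y_K^*\|^2 \to 0$ as $K \to \infty$. From $E[U_m] = 0$, for $m \ge 1$, we have $E[Y_K^*] = 0$
and, moreover,

$$E\|Y - Y_K^*\|^2 = E\|(Y - \mathcal{L}_X\beta_0^*) + \mathcal{L}_X(\beta_0^* - \beta_K^*)\|^2$$

$$= E\|Y - \mathcal{L}_X\beta_0^*\|^2 + E\|\mathcal{L}_X(\beta_0^* - \beta_K^*)\|^2 + 2E\langle Y - \mathcal{L}_X\beta_0^*, \mathcal{L}_X(\beta_0^* - \beta_K^*)\rangle.$$

Since $E\|Y - \mathcal{L}_X\beta_0^*\|^2 = \mathrm{trace}(R_{YY}) - E\|\mathcal{L}_X\beta_0^*\|^2$ and $\beta_0^*$ is the solution of the normal
equation (6), we obtain $E\langle Y - \mathcal{L}_X\beta_0^*, \mathcal{L}_X(\beta_0^* - \beta_K^*)\rangle = E\langle \mathcal{L}_X^*(Y - \mathcal{L}_X\beta_0^*), \beta_0^* - \beta_K^*\rangle = 0$.
Likewise,

$$E\|\mathcal{L}_X(\beta_0^* - \beta_K^*)\|^2 = \sum_{m=K+1}^{\infty} \rho_m^2\|R_{YY}v_m\|^2,$$

implying (19). □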

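Proof of Theorem 6.3. From (17), (18), for any $K \ge 1$,

$$R_{Y^*Y^*} - R_{Y_K^*Y_K^*} = R_{YY}^{1/2}\left[\sum_{m=K+1}^{\infty} \rho_m^2\, q_m \otimes q_m\right] R_{YY}^{1/2} = R_{YY}^{1/2} R_{K+1}^* R_{K+1} R_{YY}^{1/2},$$

where $R_{K+1} = \mathrm{Proj}_{\mathrm{span}\{q_m,\, m \ge K+1\}} R$ and hence, $R_{Y^*Y^*} - R_{Y_K^*Y_K^*} \ge 0$. Note that

$$r_{Y_{c,\infty}Y_{c,\infty}}(s,t) = E[Y_{c,\infty}(s)Y_{c,\infty}(t)] = \sum_{m,j=1}^{\infty} E[V_m V_j] R_{YY}v_m(s) R_{YY}v_j(t)$$

$$= \sum_{m=1}^{\infty} R_{YY}v_m(s) R_{YY}v_m(t) = \sum_{m=1}^{\infty} R_{YY}^{1/2}(q_m)(s)\, R_{YY}^{1/2}(q_m)(t),$$

implying that

$$R_{Y_{c,\infty}Y_{c,\infty}} - R_{Y^*Y^*} = R_{YY}^{1/2}\left[\sum_{m=1}^{\infty} (1 - \rho_m^2)\, q_m \otimes q_m\right] R_{YY}^{1/2} \ge 0.$$

Finally, from Lemma 6.2(b), we have $Y = Y_{c,\infty} + Y_{c,\infty}^{\perp}$; therefore $r_{YY} = r_{Y_{c,\infty}Y_{c,\infty}} + r_{Y_{c,\infty}^{\perp}Y_{c,\infty}^{\perp}}$.
This leads to $r_{YY} - r_{Y_{c,\infty}Y_{c,\infty}} = r_{Y_{c,\infty}^{\perp}Y_{c,\infty}^{\perp}}$ and $R_{YY} - R_{Y_{c,\infty}Y_{c,\infty}} = R_{Y_{c,\infty}^{\perp}Y_{c,\infty}^{\perp}} \ge 0$. □
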



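We need the following auxiliary result to prove Theorem 6.4. We call two $L_2$-processes
$X$ and $Y$ uncorrelated if and only if $E[\langle u, X\rangle\langle v, Y\rangle] = 0$ for all $L_2$-functions $u$ and $v$.

Lemma 7.1. $Y_{c,\infty}^{\perp}$ and $Y^*$ are uncorrelated.

Proof. For any $u, v \in L_2(T_2)$, write $v = v_1 + v_2$, with $R_{YY}^{1/2}v_1 \in \mathrm{span}\{q_m;\ m \ge 1\}$, which
is equivalent to $v_1 \in \mathrm{span}\{v_m;\ m \ge 1\}$, and $R_{YY}^{1/2}v_2 \in \mathrm{span}\{q_m;\ m \ge 1\}^{\perp}$. Then

$$\langle v_2, Y^*\rangle = \sum_m \rho_m U_m \langle v_2, R_{YY}v_m\rangle = \sum_m \rho_m U_m \langle R_{YY}^{1/2}v_2, q_m\rangle = 0.$$

With $v_1 = \sum_m a_m v_m$, write $\langle v, Y^*\rangle = \langle v_1, Y^*\rangle = \sum_{m,j} a_m\rho_j U_j \langle v_m, R_{YY}v_j\rangle = \sum_m a_m\rho_m U_m$.
Furthermore, from Lemma 6.2(b), $E[U_m\langle u, Y_{c,\infty}^{\perp}\rangle] = 0$ for all $m \ge 1$. We conclude that
$E[\langle u, Y_{c,\infty}^{\perp}\rangle\langle v, Y^*\rangle] = 0$. □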

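Proof of Theorem 6.4. Calculating the covariance operators for $(Y, Y^*)$,

$$r_{Y^*Y^*}(s,t) = E[Y^*(s)Y^*(t)] = \sum_{m,j} \rho_m\rho_j E[U_m U_j] R_{YY}v_m(s) R_{YY}v_j(t)$$

$$= \sum_m \rho_m^2 R_{YY}v_m(s) R_{YY}v_m(t) = \sum_m \rho_m^2 R_{YY}^{1/2}q_m(s)\, R_{YY}^{1/2}q_m(t),$$

so that

$$R_{Y^*Y^*} = R_{YY}^{1/2}\left[\sum_m \rho_m^2\, q_m \otimes q_m\right] R_{YY}^{1/2} = R_{YY}^{1/2} R_0 R_{YY}^{1/2}.$$

Now, from Lemmas 6.2 and 7.1,

$$r_{YY^*}(s,t) = E[Y(s)Y^*(t)] = E[(Y_{c,\infty}(s) + Y_{c,\infty}^{\perp}(s))Y^*(t)]$$

$$= E[Y_{c,\infty}(s)Y^*(t)] = E\left[\sum_m V_m R_{YY}v_m(s) \sum_j \rho_j U_j R_{YY}v_j(t)\right]$$

$$= \sum_{m,j} E[V_m U_j]\rho_j R_{YY}v_m(s) R_{YY}v_j(t)$$

$$= \sum_m \rho_m^2 R_{YY}v_m(s) R_{YY}v_m(t) = r_{Y^*Y^*}(s,t).$$

Hence, $R_{YY^*} = R_{Y^*Y^*}$. The correlation operator for $(Y, Y^*)$ is $\tilde{R} = R_{YY}^{-1/2} R_{YY^*} R_{Y^*Y^*}^{-1/2} = R_{YY}^{-1/2} R_{Y^*Y^*}^{1/2}$, with $\tilde{R}\tilde{R}^* = R_{YY}^{-1/2} R_{Y^*Y^*} R_{YY}^{-1/2} = R_0$. Hence, $\tilde{\rho}_m = \rho_m$, $\tilde{p}_m = q_m$ and
$\tilde{q}_m = \tilde{R}^*\tilde{p}_m/\tilde{\rho}_m = R_{Y^*Y^*}^{1/2} R_{YY}^{-1/2} q_m/\rho_m = R_{Y^*Y^*}^{1/2} v_m/\rho_m$. Moreover, $\tilde{u}_m = R_{YY}^{-1/2}\tilde{p}_m = R_{YY}^{-1/2}q_m = v_m$ and $\tilde{v}_m = R_{Y^*Y^*}^{-1/2}\tilde{q}_m = R_{Y^*Y^*}^{-1/2} R_{Y^*Y^*}^{1/2} v_m/\rho_m = v_m/\rho_m$. Note that $Y_{c,\infty}^* = \sum_m \tilde{V}_m R_{Y^*Y^*}\tilde{v}_m$ with

$$\tilde{V}_m = \langle \tilde{v}_m, Y^*\rangle = \left\langle v_m/\rho_m, \sum_j \rho_j U_j R_{YY}v_j\right\rangle = \sum_j \frac{\rho_j}{\rho_m} U_j \langle v_m, R_{YY}v_j\rangle = U_m,$$

$$R_{Y^*Y^*}\tilde{v}_m = R_{YY}^{1/2} R_0 R_{YY}^{1/2} v_m/\rho_m = R_{YY}^{1/2} R_0 q_m/\rho_m = \rho_m R_{YY}^{1/2} q_m = \rho_m R_{YY}v_m.$$

Substituting into the equation on the left-hand side of (26), one obtains the equation on
the right-hand side of (26). □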

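Proof of Proposition 6.5. From the definition, $\mathcal{L}_X^*$ must satisfy $\langle \mathcal{L}_X\beta, z\rangle = \langle \beta, \mathcal{L}_X^* z\rangle$
for $\beta \in L_2(T_1 \times T_2)$ and $z \in L_2(T_2)$. Note that $\langle \mathcal{L}_X\beta, z\rangle = \int_{T_2} (\mathcal{L}_X\beta)(t) z(t)\,dt = \int_{T_2}\int_{T_1} X(s)\beta(s,t)z(t)\,ds\,dt$ and $\langle \beta, \mathcal{L}_X^* z\rangle = \int\!\!\int_{T_1 \times T_2} \beta(s,t)(\mathcal{L}_X^* z)(s,t)\,ds\,dt$. For the
difference, we obtain $\int\!\!\int \beta(s,t)[X(s)z(t) - (\mathcal{L}_X^* z)(s,t)]\,ds\,dt = 0$ for arbitrary $\beta \in L_2(T_1 \times T_2)$ and $z \in L_2(T_2)$. This implies that $(\mathcal{L}_X^* z)(s,t) = X(s)z(t)$. □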

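Proof of Proposition 6.6. By Proposition 6.5, $\Gamma_{XX} = E[\mathcal{L}_X^*\mathcal{L}_X]$. Since the integral
operator $\Gamma_{XX}$ has the $L_2$-integral kernel $r_{XX}$, it is a Hilbert-Schmidt operator (Conway
(1985)). Moreover, for $\beta_1, \beta_2 \in L_2(T_1 \times T_2)$,

$$\langle \Gamma_{XX}\beta_1, \beta_2\rangle = \int\!\!\int (\Gamma_{XX}\beta_1)(s,t)\beta_2(s,t)\,ds\,dt = \int\!\!\int\!\!\int r_{XX}(s,w)\beta_1(w,t)\beta_2(s,t)\,dw\,ds\,dt,$$

$$\langle \beta_1, \Gamma_{XX}\beta_2\rangle = \int\!\!\int \beta_1(s,t)(\Gamma_{XX}\beta_2)(s,t)\,ds\,dt = \int\!\!\int\!\!\int \beta_1(w,t) r_{XX}(s,w)\beta_2(s,t)\,dw\,ds\,dt,$$

implying that $\Gamma_{XX}$ is self-adjoint. Furthermore, $\Gamma_{XX}$ is non-negative definite because,
for arbitrary $\beta \in L_2(T_1 \times T_2)$,

$$\langle \Gamma_{XX}\beta, \beta\rangle = \int\!\!\int\!\!\int E[X(s)X(w)]\beta(w,t)\beta(s,t)\,dw\,ds\,dt$$

$$= E\int (\mathcal{L}_X\beta)(t)(\mathcal{L}_X\beta)(t)\,dt = E\|\mathcal{L}_X\beta\|^2 \ge 0. \quad □$$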